    Data Mining Using Relational Database Management Systems

    Software packages providing a whole set of data mining and machine learning algorithms are attractive because they allow experimentation with many kinds of algorithms in an easy setup. However, these packages are often based on main-memory data structures, limiting the amount of data they can handle. In this paper we use a relational database as secondary storage in order to eliminate this limitation. Unlike existing approaches, which often focus on optimizing a single algorithm to work with a database backend, we propose a general approach, which provides a database interface for several algorithms at once. We have taken a popular machine learning software package, Weka, and added a relational storage manager as a back tier to the system. The extension is transparent to the algorithms implemented in Weka, since it is hidden behind Weka’s standard main-memory data structure interface. Furthermore, some general mining tasks are transferred into the database system to speed up execution. We tested the extended system, referred to as WekaDB, and our results show that it achieves much higher scalability than Weka, while providing the same output and maintaining good computation time.
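
    A minimal sketch of the idea described above: algorithms program against a single dataset interface, and a relational backend can sit behind that interface and push a common mining task (class counting) into SQL. It uses Python's standard `sqlite3` module; `Dataset`, `MemoryDataset`, `SQLiteDataset`, `majority_class`, and the `instances` table are hypothetical names, not Weka's Java API or WekaDB's actual design.

```python
import sqlite3
from abc import ABC, abstractmethod
from collections import Counter
from typing import Iterator, Sequence, Tuple

# Hypothetical row layout: two numeric attributes and a nominal class label.
Row = Tuple[float, float, str]


class Dataset(ABC):
    """Hypothetical stand-in for a main-memory dataset interface."""

    @abstractmethod
    def __len__(self) -> int: ...

    @abstractmethod
    def rows(self) -> Iterator[Row]: ...

    def class_counts(self) -> Counter:
        # Default: scan every row in the client, as a main-memory package would.
        return Counter(label for _, _, label in self.rows())


class MemoryDataset(Dataset):
    def __init__(self, data: Sequence[Row]):
        self._data = list(data)

    def __len__(self) -> int:
        return len(self._data)

    def rows(self) -> Iterator[Row]:
        return iter(self._data)


class SQLiteDataset(Dataset):
    """Same interface, but the rows live in the relational table `instances`."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def __len__(self) -> int:
        return self._conn.execute("SELECT COUNT(*) FROM instances").fetchone()[0]

    def rows(self) -> Iterator[Row]:
        # Iterating the cursor steps through results instead of
        # materializing the whole table in memory.
        return iter(self._conn.execute("SELECT a, b, label FROM instances"))

    def class_counts(self) -> Counter:
        # Push this common mining task into the database as a GROUP BY.
        cur = self._conn.execute(
            "SELECT label, COUNT(*) FROM instances GROUP BY label")
        return Counter(dict(cur.fetchall()))


def majority_class(ds: Dataset) -> str:
    # Algorithm code depends only on the Dataset interface, not the backend.
    return ds.class_counts().most_common(1)[0][0]


data = [(1.0, 2.0, "yes"), (0.5, 1.0, "no"), (3.0, 0.1, "yes")]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (a REAL, b REAL, label TEXT)")
conn.executemany("INSERT INTO instances VALUES (?, ?, ?)", data)
mem, db = MemoryDataset(data), SQLiteDataset(conn)
print(majority_class(mem), majority_class(db))  # yes yes
```

    Because `majority_class` only calls `class_counts()`, swapping the in-memory backend for the SQL one changes where the work happens without changing the result.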

    Flower-CDN: A hybrid P2P overlay for Efficient Query Processing in CDN

    Many websites with a large user base, e.g., websites of non-profit organizations, do not have the financial means to install large web servers or use specialized content distribution networks such as Akamai. For those websites, we have developed Flower-CDN, a locality-aware, peer-to-peer based content distribution network in which the users who are interested in a website support the distribution of its content. The idea is that peers keep the web pages they retrieve and later serve them to other peers that are close to them in locality. Our architecture is a hybrid between structured and unstructured networks. When a node requests a web page from a website for the first time, a locality-aware DHT quickly finds a peer in its neighborhood that has the web page available. Additionally, all peers in a given region that maintain content of a particular website build an unstructured content overlay. Within a content overlay, peers gossip information about their content, allowing the system to maintain accurate information despite failures and churn. In our detailed performance evaluation, we compare Flower-CDN with Squirrel, a content distribution network that is strictly based on DHTs and is not locality-aware. Compared to Squirrel, Flower-CDN reduces lookup latency by a factor of 9 and the transfer distance by a factor of 2. We also show that Flower-CDN's gossiping has low overhead and can be adjusted according to hit ratio requirements and bandwidth availability.
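
    The lookup-then-gossip structure described above can be sketched as follows, under heavy simplifying assumptions: the locality-aware DHT is modelled as a plain dictionary keyed by a hash of (website, region), and gossip is a push-pull merge of content summaries that never forgets entries. `Peer`, `Directory`, `dht_key`, and `gossip_with` are hypothetical names; this is not Flower-CDN's actual protocol, which must also cope with failures, churn, and stale entries.

```python
import hashlib
from typing import Dict, Optional, Set


def dht_key(website: str, region: str) -> str:
    # Locality-aware key: the region is part of the key, so the same website
    # maps to a different directory entry in each region.
    return hashlib.sha1(f"{website}|{region}".encode()).hexdigest()


class Peer:
    def __init__(self, pid: str, region: str):
        self.pid, self.region = pid, region
        self.pages: Set[str] = set()
        # Local view of the content overlay: peer id -> pages it holds.
        self.view: Dict[str, Set[str]] = {}

    def gossip_with(self, other: "Peer") -> None:
        # Push-pull exchange: each side merges the other's content summary.
        # Simplification: entries are only ever added, never expired.
        mine = {self.pid: set(self.pages), **{k: set(v) for k, v in self.view.items()}}
        theirs = {other.pid: set(other.pages), **{k: set(v) for k, v in other.view.items()}}
        for k, v in theirs.items():
            if k != self.pid:
                self.view.setdefault(k, set()).update(v)
        for k, v in mine.items():
            if k != other.pid:
                other.view.setdefault(k, set()).update(v)


class Directory:
    """Stand-in for the DHT: key -> members of one regional content overlay."""

    def __init__(self):
        self.table: Dict[str, Dict[str, Peer]] = {}

    def join(self, website: str, peer: Peer) -> None:
        self.table.setdefault(dht_key(website, peer.region), {})[peer.pid] = peer

    def lookup(self, website: str, region: str, page: str) -> Optional[Peer]:
        # Return a same-region peer that holds the page, or None on a miss.
        for peer in self.table.get(dht_key(website, region), {}).values():
            if page in peer.pages:
                return peer
        return None


d = Directory()
a, b, c = Peer("a", "eu"), Peer("b", "eu"), Peer("c", "us")
b.pages.add("/index.html")
for p in (a, b, c):
    d.join("example.org", p)
print(d.lookup("example.org", "eu", "/index.html").pid)  # b
print(d.lookup("example.org", "us", "/index.html"))      # None
a.gossip_with(b)
print(a.view)  # {'b': {'/index.html'}}
```

    A miss in the requester's region (here `us`) would fall back to fetching from the website itself; that fallback is omitted from the sketch.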

    A Highly Robust P2P-CDN Under Large-Scale and Dynamic Participation

    By building a P2P Content Distribution Network (CDN), peers collaborate to distribute the content of under-provisioned websites and to serve queries for larger audiences on behalf of the websites. This can prove very challenging, given the highly dynamic and autonomous participation of peers. Indeed, the P2P-CDN should adapt to increasing numbers of participants and provide robust algorithms under churn, because these issues have a key impact on performance. Also, the distribution of tasks and content over peers should take their interests into account in order to give them proper incentives to cooperate. Finally, query routing should target peers that are close in locality and serve content from nearby providers, to reduce network overload and achieve scalability. We have previously proposed Flower-CDN, a locality- and interest-aware P2P-CDN, which lacks efficient management of robustness and scalability. In this paper, we focus on these crucial shortcomings and propose PetalUp-CDN. The performance evaluation with respect to scalability and churn shows highly significant gains.

    Short paper: Cheat Detection and Prevention in P2P MOGs

    In peer-to-peer games, cheaters can easily disrupt game state computation and dissemination, perform illegal actions, and unduly gain access to sensitive information. We propose AntiCheat, a cheat detection and prevention protocol that follows a mutual verification approach complemented with information exposure mitigation. It is based on a randomized dynamic proxy scheme for both the dissemination and the verification of actions, and it further reduces the information exposed to players to close to the minimum required to render the game. We build a proof-of-concept prototype on top of Quake III. Experiments with up to 48 players show that opportunities to cheat can be significantly reduced, even in the presence of colluding cheaters, while keeping good performance.
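
    One plausible way to realize a randomized dynamic proxy, sketched below, is for every node to derive each player's proxy for a round from a shared seed, so the choice is identical on all honest nodes, changes every round, and is never the player itself. The movement rule (`MAX_SPEED`), the visibility radius (`VIEW_RADIUS`), the seed, and all function names are hypothetical assumptions; the abstract does not specify AntiCheat's actual proxy selection or verification rules, and a real scheme must keep the seed out of any single player's control.

```python
import hashlib
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

MAX_SPEED = 10.0    # hypothetical rule: max distance a player may move per action
VIEW_RADIUS = 50.0  # hypothetical visibility radius used for exposure mitigation


@dataclass
class Move:
    player: str
    old: Tuple[float, float]
    new: Tuple[float, float]


def pick_proxy(player: str, players: List[str], round_no: int, seed: str) -> str:
    # Every honest node computes the same proxy from (seed, player, round),
    # the proxy changes from round to round, and a player is never its own proxy.
    candidates = sorted(p for p in players if p != player)
    digest = hashlib.sha256(f"{seed}|{player}|{round_no}".encode()).digest()
    return candidates[int.from_bytes(digest[:8], "big") % len(candidates)]


def verify(move: Move) -> bool:
    # Proxy-side check of a single game rule: no moving faster than MAX_SPEED.
    return math.dist(move.old, move.new) <= MAX_SPEED


def recipients(move: Move, positions: Dict[str, Tuple[float, float]]) -> List[str]:
    # Exposure mitigation: forward the update only to players within VIEW_RADIUS.
    return sorted(
        p for p, pos in positions.items()
        if p != move.player and math.dist(pos, move.new) <= VIEW_RADIUS
    )


players = ["p1", "p2", "p3", "p4"]
proxy = pick_proxy("p1", players, round_no=7, seed="match-42")
ok = verify(Move("p1", (0, 0), (3, 4)))       # moved 5 units: accepted
cheat = verify(Move("p1", (0, 0), (30, 40)))  # moved 50 units: rejected
positions = {"p1": (3, 4), "p2": (10, 10), "p3": (200, 200)}
print(proxy, ok, cheat, recipients(Move("p1", (0, 0), (3, 4)), positions))
```

    Hashing rather than letting players pick their proxies is the design choice that matters here: a cheater cannot steer its own actions to a colluding verifier without controlling the seed.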

    P2Prec: a Social-based P2P Recommendation System for Large-scale Data Sharing

    We propose P2Prec, a P2P recommendation system for large-scale data sharing that exploits friendship links. The main idea is to recommend high-quality content that is related to the query topics and held by friends (or friends of friends) who are experts on those topics. Expertise is deduced implicitly from the content stored by a user. To exploit friendship links, we rely on Friend-Of-A-Friend (FOAF) descriptions. To disseminate information about experts, we propose new semantic-based gossip algorithms that provide scalability, robustness, simplicity and load balancing. Using information retrieval techniques, we propose an efficient query routing algorithm that recommends the best peers to serve a query. In our experimental evaluation, using the TREC09 dataset and the Wiki vote social network, we show that semantic gossiping increases recall by a factor of 2.5 compared with well-known random gossiping. Furthermore, P2Prec achieves reasonable recall with acceptable query processing load and network traffic.
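
    The two information-retrieval ingredients named above, ranking peers against a query and biasing gossip toward semantically similar peers, can be sketched with bag-of-words profiles and cosine similarity. This is an illustrative assumption: the profile representation, `profile`, `rank_peers`, `semantic_gossip`, and the view-size parameter are hypothetical, and P2Prec's actual topic model and gossip algorithms may differ.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def profile(docs: List[str]) -> Counter:
    # Expertise inferred implicitly from stored content: a bag of words.
    return Counter(w for d in docs for w in d.lower().split())


def rank_peers(query: str, profiles: Dict[str, Counter], k: int = 2) -> List[str]:
    # Recommend the k peers whose content profile best matches the query.
    q = Counter(query.lower().split())
    return sorted(profiles, key=lambda p: cosine(q, profiles[p]), reverse=True)[:k]


def semantic_gossip(own: Counter, view: Dict[str, Counter],
                    received: Dict[str, Counter], size: int) -> Dict[str, Counter]:
    # Merge the local and received views, then keep the `size` peers most
    # similar to our own profile (random gossip would keep a random subset).
    merged = {**view, **received}
    keep = sorted(merged, key=lambda p: cosine(own, merged[p]), reverse=True)[:size]
    return {p: merged[p] for p in keep}


profiles = {
    "alice": profile(["jazz piano history", "jazz records"]),
    "bob": profile(["football scores", "football league"]),
    "carol": profile(["piano lessons", "classical piano"]),
}
print(rank_peers("jazz piano", profiles))  # ['alice', 'carol']
view = semantic_gossip(profiles["carol"], {"bob": profiles["bob"]},
                       {"alice": profiles["alice"]}, size=1)
print(list(view))  # ['alice']
```

    Keeping semantically close peers in each view means a query is more likely to reach a relevant expert within a few hops, which is the intuition behind preferring semantic over random gossip.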

    Area-based gossip multicast

    Distributed Data Management in 2020?

    Work on distributed data management commenced in the mid-1970s, shortly after the introduction of the relational model. The 1970s and 1980s were very active periods for the development of distributed relational database technology, and claims were made that within the following ten years centralized databases would be an “antique curiosity” and most organizations would move toward distributed database managers [1]. That prediction has certainly come true, and all commercial DBMSs today are distributed.